10. Cross-Validation and Feature Importance
ND320 C4 L3 12 Nested Cross-Validation
Nested Cross Validation Summary
Ideally, we would pick the best hyperparameters on one subset of the data and then evaluate the model on a separate hold-out set, similar to a train-validation-test split. When there isn't enough data to split the dataset into three parts, we can instead nest the hyperparameter selection inside another layer of cross-validation — this is the Nested Cross-Validation technique.
We then walked through applying this technique to our dataset. Our measured performance dropped because the hyperparameters are no longer overfit to the same data we use to evaluate model performance.
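The nesting described above can be sketched with scikit-learn, where GridSearchCV forms the inner loop (hyperparameter selection) and cross_val_score forms the outer loop (performance estimation). The dataset and parameter grid here are stand-ins, not the course's actual data:

```python
# Sketch of nested cross-validation on a synthetic dataset
# (the course used its own activity-classification data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Stand-in dataset; assumed shapes, not the course data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: pick the best max_depth via 3-fold cross-validation.
inner_search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=3,
)

# Outer loop: estimate performance on folds the search never tuned on.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(outer_scores.mean())
```

Because the outer test folds are never seen by the inner search, the outer score is an honest estimate — typically a bit lower than what an unnested search would report.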
ND320 C4 L3 13 Feature Importance
Summary
We have just learned that another way to regularize our model and increase performance (besides reducing the tree depth) is to reduce the number of features we use. The RandomForestClassifier can tell us how important each feature is in classifying the data. We found the 10 most important features determined by the RandomForestClassifier and trained the model on just those 10 features. The trained model no longer misclassified bike as walk, and this improved our classifier performance by 15%, just by picking the most important features!